-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple time spines using new time spine configs #1348
Conversation
f0a326a
to
987a6ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable so far. I assume we will have more robust tests added as we get some models with custom time spines in place.
I left a bunch of inlines. None of them are merge blocking, but may as well clean up whatever you think needs cleaning up before hitting the button.
return ( | ||
self.source_nodes_for_metric_queries + self.source_nodes_for_group_by_item_queries + (self.time_spine_node,) | ||
) | ||
return self.source_nodes_for_metric_queries + self.source_nodes_for_group_by_item_queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I read below and I see why this is this way, but I think it's a bit weird - this is mainly an artifact of us expecting callers to wire in the time spine nodes as group by item source nodes.
I can't come up with anything obviously better, and it's probably outside the scope of this PR to do that refactor, but we should probably address this. I suspect it'll get confusing with things like custom calendar processing.
self._time_spine_source_nodes = { | ||
granularity: MetricTimeDimensionTransformNode.create( | ||
parent_node=ReadSqlSourceNode.create( | ||
data_set=data_set_converter.build_time_spine_source_data_set(time_spine_source) | ||
), | ||
aggregation_time_dimension_reference=TimeDimensionReference( | ||
element_name=time_spine_source.time_column_name | ||
), | ||
) | ||
for granularity, time_spine_source in TimeSpineSource.create_from_manifest( | ||
semantic_manifest_lookup.semantic_manifest | ||
).items() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit - I think this would be easier to read if it's building a dict via a for loop. That's a lot of function nesting inside a dict comprehension. It'd also be a much smaller change in the diff, since we could just indent the locals inside the loop and replace the assignment with an add to dict.
# Provides the time spine. | ||
time_spine_node: MetricTimeDimensionTransformNode | ||
# Provides the time spines. | ||
time_spine_nodes: Dict[TimeGranularity, MetricTimeDimensionTransformNode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might consider typing this as a Mapping so it's not considered mutable.
assert ( | ||
agg_time_dimension_instances | ||
), "Building time spine dataset requires agg_time_dimension_instances, but none were found." | ||
smallest_agg_time_grain = min(dim.spec.time_granularity for dim in agg_time_dimension_instances) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this works but it'll start failing with custom granularities. Good thing for me to note here - we'll need to be able to go from the agg time dimension instance to the time spine index granularity and check up on that.
if not compatible_time_spine_grains: | ||
raise RuntimeError( | ||
# TODO: update docs link when new docs are available | ||
f"No compatible time spine found. This query requires a time spine with granularity {smallest_agg_time_grain} or smaller. See docs to configure a new time spine: https://docs.getdbt.com/docs/build/metricflow-time-spine" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's include the smallest time spine granularity as well. Something like:
This query requires a time spine with granularity {smallest_agg_time_grain}, but the smallest available time spine granularity is {min(self._time_spine_sources.keys())}, which is too large.
time_spine_sources = { | ||
time_spine.primary_column.time_granularity: TimeSpineSource( | ||
schema_name=time_spine.node_relation.schema_name, | ||
table_name=time_spine.node_relation.relation_name, # is relation name the table name? double check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relation_name should be the table-or-view name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops I forgot about this comment 😅 thank you!
# backward compatibility. Ignore it if there is a new time spine config with the same granularity. | ||
legacy_time_spines = semantic_manifest.project_configuration.time_spine_table_configurations | ||
for legacy_time_spine in legacy_time_spines: | ||
if not time_spine_sources.get(legacy_time_spine.grain): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use in
instead of .get()
here:
if legacy_time_spine.grain not in time_spine_sources:
That way you don't get weird bugs with falsey hashable values.
@@ -14,7 +15,9 @@ def test_date_spine_date_range( # noqa: D103 | |||
simple_semantic_manifest_lookup: SemanticManifestLookup, | |||
create_source_tables: None, | |||
) -> None: | |||
time_spine_source = TimeSpineSource.create_from_manifest(simple_semantic_manifest_lookup.semantic_manifest) | |||
time_spine_source = TimeSpineSource.create_from_manifest(simple_semantic_manifest_lookup.semantic_manifest)[ | |||
TimeGranularity.DAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok. Technically this is the legacy time spine fallback, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, honestly, I'm a bit confused about why this test is here. I guess it's essentially testing that the legacy time spine model we have documented somewhere has the values we expect. But this feels like it should live in core or something, not here, since it's not actually related to MF functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it looks like its just testing the test data we have in tests_metricflow/fixtures/source_table_snapshots/time_spine_table/mf_time_spine.yaml
! I guess that might be useful to keep people from changing it? ¯_(ツ)_/¯
Completes SL-2313
Consume new time spine configs and add support for using multiple time spines at different granularities.